Reinforcement Learning in 2-players Games

Authors

  • Kazuteru Miyazaki
  • Sougo Tsuboi
  • Shigenobu Kobayashi
Abstract

The purpose of a reinforcement learning system is, in general, to learn an optimal policy. However, in 2-players games such as Othello, it is important to acquire a penalty avoiding policy. In this paper, we focus on the formation of penalty avoiding policies based on the Penalty Avoiding Rational Policy Making algorithm [2]. In applying it to large-scale problems, we are confronted with the curse of dimensionality. To overcome it in 2-players games, we introduce several ideas and heuristics. We show that our learning player can always defeat the well-known Othello program KITTY.

Similar resources

Estimating the Experience-Weighted Attractions for the Migration-Emission Game

Players are unlikely to immediately play equilibrium strategies in complicated games or in games in which they do not have much experience playing. In these cases, players will need to learn to play equilibrium strategies. In laboratory experiments, subjects show systematic patterns of learning during a game. In psychological and economic models of learning, players tend to play a strategy more...

On the convergence of reinforcement learning

This paper examines the convergence of payoffs and strategies in Erev and Roth’s model of reinforcement learning. When all players use this rule it eliminates iteratively dominated strategies and in two-person constant-sum games average payoffs converge to the value of the game. Strategies converge in constant-sum games with unique equilibria if they are pure or if they are mixed and the game is...
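
For readers unfamiliar with the rule, the following is a minimal sketch of the basic cumulative form of Erev and Roth's reinforcement learning, assuming non-negative payoffs and strictly positive initial propensities. The function names (`choose`, `reinforce`, `choice_probabilities`) are illustrative only and are not taken from the paper.

```python
import random


def choice_probabilities(propensities):
    """Return each action's choice probability: its share of total propensity."""
    total = sum(propensities)
    return [q / total for q in propensities]


def choose(propensities, rng):
    """Sample an action with probability proportional to its propensity."""
    r = rng.random() * sum(propensities)
    cumulative = 0.0
    for action, q in enumerate(propensities):
        cumulative += q
        if r < cumulative:
            return action
    return len(propensities) - 1  # guard against floating-point round-off


def reinforce(propensities, action, payoff):
    """Basic cumulative update: add the received payoff to the chosen action.

    Assumption: payoff >= 0, so propensities stay positive.
    """
    propensities[action] += payoff


if __name__ == "__main__":
    rng = random.Random(0)
    q = [1.0, 1.0]            # hypothetical initial propensities
    payoffs = [1.0, 0.2]      # hypothetical fixed payoff per action
    for _ in range(200):
        a = choose(q, rng)
        reinforce(q, a, payoffs[a])
    print(choice_probabilities(q))
```

In this sketch an action that pays more accumulates propensity faster, so it is chosen more often over time; the convergence results in the abstract concern what happens when every player in a game updates this way.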

Collective Learning in Games through Social Networks

This paper argues that combining social network communication and games can positively influence the learning behavior of players. We propose a computational model that combines features of social network learning (communication) and game-based learning (strategy reinforcement). The focus is on cooperative games, in which a coalition of players tries to achieve a common goal. We show that enric...

Inventing New Signals

A model of inventing new signals is introduced in the context of sender-receiver games with reinforcement learning. If the invention parameter is set to zero, it reduces to basic Roth-Erev learning applied to acts rather than strategies, as in Argiento et al. (2009). If every act is uniformly reinforced in every state it reduces to the Chinese Restaurant Process also known as the Hoppe-Pólya urn ...

Individual Differences in EWA Learning with Partial Payoff Information

We extend EWA learning to games in which only the set of possible foregone payoffs from unchosen strategies are known. We assume players estimate unknown foregone payoffs from a strategy, by substituting the last payoff actually received from that strategy, or by clairvoyantly guessing the actual foregone payoff. Either assumption improves predictive accuracy of EWA. Learning parameters are also es...

Publication date: 2004